Multi-phase Word Sense Embedding Learning Using a Corpus and a Lexical Ontology

Authors

  • Qi Li
  • Tianshi Li
  • Baobao Chang

Abstract

Word embeddings play a significant role in many modern NLP systems. However, most prevalent word embedding learning methods learn one representation per word, which is problematic for polysemous and homonymous words. To address this problem, we propose a multi-phase word sense embedding learning method that uses both a corpus and a lexical ontology to learn one embedding per word sense. We use word sense definitions and the relations between word senses defined in a lexical ontology in a different way from existing systems. Experimental results on a word similarity task show that our approach produces word sense embeddings of high quality.
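The abstract does not say how word similarity is scored when each word has several sense embeddings. As an illustration only, the sketch below shows two scoring rules that are common in the multi-sense embedding literature, MaxSim (best-matching sense pair) and AvgSim (mean over all sense pairs). The function names and the toy two-dimensional vectors are assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_sim(senses_a, senses_b):
    """MaxSim: similarity of the closest pair of senses."""
    return max(cosine(u, v) for u, v in product(senses_a, senses_b))


def avg_sim(senses_a, senses_b):
    """AvgSim: mean similarity over every pair of senses."""
    sims = [cosine(u, v) for u, v in product(senses_a, senses_b)]
    return sum(sims) / len(sims)


# Hypothetical sense embeddings (made-up values, not learned vectors):
# "bank" has a financial sense and a river sense, "money" has one sense.
bank = [[1.0, 0.0], [0.0, 1.0]]
money = [[1.0, 0.0]]

print(max_sim(bank, money))
print(avg_sim(bank, money))
```

With one embedding per word, the two senses of a word like "bank" would be merged into a single vector; keeping them separate lets MaxSim pick the sense that is relevant to the comparison.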

Similar resources

Enriching Ontology Concepts Based on Texts from WWW and Corpus

Despite the growth of ontological engineering tools, ontology knowledge acquisition remains a highly manual, time-consuming and complex task. Automatic ontology learning is a well-established research field whose goal is to support the semi-automatic construction of ontologies starting from available digital resources (e.g., a corpus, web pages, dictionaries, semi-structured and structured...

Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture

The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL), a list of the most frequent words from a corpus of 3,458,445 tokens made up of the 800 most recent pharmacy texts, including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...

context2vec: Learning Generic Context Embedding with Bidirectional LSTM

Context representations are central to various NLP tasks, such as word sense disambiguation, named entity recognition, coreference resolution, and many more. In this work we present a neural model for efficiently learning a generic context embedding function from large corpora, using bidirectional LSTM. With a very simple application of our context representations, we manage to surpass or nearl...

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

Enriching a lexical semantic net with selectional preferences by means of statistical corpus analysis

Broad-coverage ontologies which represent lexical semantic knowledge are being built for more and more natural languages. Such resources provide very useful information for word sense disambiguation, which is crucial for a variety of NLP tasks (e.g. semantic annotation of corpora, information retrieval, or semantic inferencing). Since the manual encoding of such ontologies is very labour-intens...

Journal:
  • CoRR

Volume abs/1606.04835

Publication date 2016